ABSTRACT

We introduce a novel method for selecting and controlling smart appliances in physical spaces through a head-worn computing device with a near-eye display and wireless communication. For this purpose, we augment a commercial wearable computing device, Google Glass, with a narrow-beam IR emitter. This configuration yields a usable beam width of 2 to 4 feet (60 to 120 cm) for targeting at room scale. We describe a disambiguation technique for cases in which infrared targeting hits multiple targets simultaneously. A target acquisition study with 14 participants shows that selection using head orientation with our device outperforms list selection on a wearable device. We also report qualitative data from using our device to control multiple appliances in a smart home scenario.

Author Keywords

INTRODUCTION

Increasingly, devices and services in our built environment are networked and can be controlled remotely. The proliferation of smart, controllable devices such as intelligent lighting, AV equipment, HVAC systems, or kitchen appliances raises the question of how best to interact with them.

Today, commercial solutions (such as Belkin WeMo) use handheld mobile devices as universal remote controls for such appliances. In these solutions, users first browse a list of all available devices and then call up a device-specific user interface. This method faces two challenges: naming and scoping. First, assigning clear names is non-trivial. In shared spaces, the person trying to control the device might not be the one who named it; e.g., while an office building manager may know what "Light 4 in area E" corresponds to, an occupant may not. Second, without a method of scoping selection to automatically filter out non-relevant devices, paging through long lists of names or navigating hierarchies can become more cumbersome than the physical action the "convenient" software solution was meant to replace.

Figure 1. Using an augmented head-worn device (1), users can control smart home appliances (2) with head orientation targeting. A near-eye display then shows an appliance control UI (3), which users navigate through multitouch gestures.

To address these challenges, researchers have augmented mobile devices with accessories such as laser pointers to enable direct aiming at target devices [2, 15]. While promising, handheld devices have several drawbacks: the device first has to be retrieved (e.g., from a pocket) and aimed; two hands may be necessary for operation (one to hold the device, one to operate the touch screen); and the user's visual attention is split between looking down at a screen and out at the device to be controlled.

In this paper, we introduce a novel method for selecting and controlling smart appliances in physical spaces through the use of a head-worn computing device with a near-eye display and wireless communication. We augment Google Glass¹ with custom hardware for this purpose. Users first look in the direction of the appliance they wish to control to initiate interaction (e.g., at a lamp to control lighting, or at a speaker to change music playback volume). If multiple appliances fall within communication range, a disambiguation technique that combines on-screen information with visual feedback on the target appliances lets users select their desired target. Once a target is acquired, an appliance-specific control UI shown on the head-mounted display enables adjustment of discrete and continuous parameters through a touchpad interface (Figure 1).

¹http://www.google.com/glass/start/

Our hardware relies on infrared (IR) communication between Glass and target appliances to establish a connection, and on wireless 802.15.4 radio communication to exchange control messages. Glass is augmented with a narrow-beam IR emitter and an 802.15.4 radio. Target appliances similarly have IR receivers and radios. This combination enables users to initiate interaction by orienting their head; once interaction is initiated, users are free to look away from the target appliances while issuing control commands.

While prior work has tended to focus on proofs-of-concept, we also contribute empirical data on the system performance, usability, and user experience of head-orientation targeting and device control. We first report measurements of the range and beam characteristics of our controller. We then present a study with 14 participants that compares acquisition times for physical targets in a room between our technique and an alternative list selection interface. We find that target acquisition through head orientation is preferred by users and is faster than list selection, given the constraints of linear input on a head-worn touch controller. We also report qualitative results from participants who used our system for home automation tasks.

RELATED WORK

Relevant prior work exists in the areas of remote control of physical appliances, evaluations of pointing in physical space, and augmented reality applications. We discuss each in turn.

Remote Control of Physical Appliances

Standard infrared remote controls for televisions and AV equipment are meant to control only a single device. These controllers tend to use wide-angle infrared LEDs. Universal remote controls are available as dedicated devices (e.g., Crestron²) or as applications for smartphones (e.g., Belkin WeMo³). They do not offer spatial selection of target devices, forcing users to browse through lists of pre-configured devices instead. Rukzio found that users strongly preferred either touching a mobile device to a target appliance or pointing at a distance over list browsing [13].

²http://www.crestron.com
Several approaches to spatial selection with handheld devices exist, either to control appliances [2, 15, 19, 14] or to exchange information with smart infrastructure sensor networks [7, 9, 4]. Key design decisions are the method by which a target device is selected, and the method by which it is then controlled or configured.

In several techniques, users select objects of interest with laser pointers. The laser dot gives the user immediate visual feedback about what is being selected (though it does not indicate whether the pointed-at object can indeed be controlled). Furthermore, laser pointers become obtrusive when other people are in the space. Beigl's early AIDA handheld combines laser pointing with IR communication to exchange commands [2]. Patel extends this technique by modulating the laser light to communicate the controller's identity [15] in order to initiate radio communication. These proofs-of-concept do not include thorough evaluations. Kemp et al. use a laser pointer to indicate to robots which item to pick up in a room [6].

The XWand [19] determines its absolute position and orientation and uses a virtual room model to select target devices. Position is determined through two ceiling-mounted cameras; orientation is determined using a built-in IMU. Users can employ physical gestures or utter speech commands to control selected devices. This technique requires room instrumentation and an up-to-date virtual model of device locations. The Tricorder [7] uses IMU orientation coupled with room localization based on received signal strength indicators (RSSI) to estimate what a user is pointing at.

Handheld projectors can both display a user interface in space and communicate control information optically, e.g., by encoding information temporally (using Gray codes in Picontrol [14] and RFIG [12]) or spatially (using QR codes in the infrared spectrum in SideBySide [18]). Printed tags like QR codes can also be affixed to devices and read by cameras. Common tagging systems are optimized to be read from a short distance, though it is possible to redesign codes so that they can be read from further away (by encoding less information) [5].

Our main area of differentiation is that we employ head orientation as the selection mechanism instead of pointing: the user looks at the target device to initiate interaction. Selection techniques with very small selectors such as laser dots are less appropriate for head-mounted applications. We therefore select a source with a wider angle of illumination (an IR LED), but restrict its angle to be narrower than in general-purpose IR applications.

Evaluation of room-scale selection

Pausch et al.'s early investigation of head-mounted displays compared head tracking to handheld orientation control for a target acquisition task in a virtual reality room shown on a head-mounted display [11]. They found a clear performance benefit for head tracking. On the other hand, Card et al. experimentally determined that the bandwidth of neck muscles is much lower than that of arm, wrist, or finger muscle groups [3], which limits the performance of any head orientation-based interaction scheme. However, many other factors, such as device characteristics and device acquisition time (e.g., pulling a phone out of one's pocket), contribute to the overall performance and preference of different selection techniques. Compared to a screen where every pixel is a potential target, the required accuracy for selecting physical devices in a room is much lower, and head orientation may provide sufficient accuracy. Our work uses head orientation only for the initial selection step, since we believe a user's attention is often drawn to the objects they intend to interact with.

Myers et al. compared different methods of interacting with displays at a distance [10] and quantified selection time and jitter or position error when using remote handheld pointing. Various techniques outperformed laser pointers.

Our work is complementary, as it provides concrete performance data on using head orientation to select targets in a physical environment.


Figure 2. Targeting interaction: when users turn towards a controllable appliance (A→B), the appliance shows immediate visual feedback (red LED) (B). Users confirm that they wish to connect to this appliance with a tap (C), which triggers connection feedback (blue LED) on the appliance.


Figure 3. When multiple appliances are within range, they all have red LEDs illuminated for feedback (A). When users initiate connections, all target appliances turn on their blue LEDs while the currently selected one blinks (B). Swiping on the touchpad traverses among responding appliances (C).


Augmented Reality Interfaces

Augmented reality applications overlay digital information and graphics on the real world, e.g., through head-mounted displays [1] or other wearable devices. Our work is somewhat orthogonal to the research focus of this field, as our device's graphics are shown in the visual periphery; they are not anchored to particular objects in the world, though our techniques could be extended to such configurations.

INITIATING INTERACTION THROUGH ATTENTION

This section describes the design goals of our system and their realization in particular interaction techniques.

Design Goals

Our work is motivated by the following design goals, which leverage opportunities of head-worn computing but also acknowledge potential challenges:

Leverage visual attention: Take advantage of the fact that visual attention can express intention; initiate interaction based on where a user is already looking.

Provide immediate feedback about selection targets in the environment: While a near-eye display can push information to the user, users don't always want to control an object simply because they are looking at it (a problem known in gaze-based interaction as the Midas touch). A calmer [17] approach is to locate visual feedback about selection targets in the environment, to prevent distraction and interruption. Such feedback should be delivered instantaneously while users look around a room.


Offer flexible orientation after initiating interaction: After initiating interaction through head orientation, enable users to reorient their head or body during the remaining interaction to prevent neck strain.

Offer efficient ways to disambiguate orientation input: It may not always be possible to identify a unique target appliance based on a user's attention and orientation. Offer ways to supplement orientation-based interaction with screen-based interaction to provide disambiguation information.

These design goals find their expression in the following interaction model.

Interaction Flow

Look: Users select a target appliance by looking in its general direction. Glass periodically sends a device id through its IR emitter, analogous to Patel's approach [15]. Target appliances have IR receivers and offer immediate visual feedback by turning on a red LED whenever a valid id is received (Figure 2B). This enables users to scan the environment with their gaze to see which appliances can be controlled.
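The receiver-side behavior in the Look step can be sketched as follows. This is a minimal illustration, not the authors' firmware: the set of valid ids, the id value 0x42, and the 0.3 s hold window (borrowed from the 300 ms LED duration used in the device characterization) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for illustration only: the id format and hold time are
# not specified for the interaction flow.
VALID_IDS = {0x42}   # hypothetical ids of known head-worn controllers
LED_HOLD_S = 0.3     # red LED stays lit this long after the last valid frame


@dataclass
class ApplianceNode:
    """Red-LED feedback logic for one appliance (hypothetical sketch)."""
    last_valid_rx: Optional[float] = None  # timestamp of last valid IR frame

    def on_ir_frame(self, sender_id: int, now: float) -> None:
        # Frames with unknown ids (e.g., from a TV remote) are ignored.
        if sender_id in VALID_IDS:
            self.last_valid_rx = now

    def red_led_on(self, now: float) -> bool:
        # Because Glass re-sends its id periodically, the LED stays lit while
        # the user keeps facing the appliance and turns off shortly after
        # they look away.
        return (self.last_valid_rx is not None
                and now - self.last_valid_rx <= LED_HOLD_S)


node = ApplianceNode()
node.on_ir_frame(0x42, now=1.00)
print(node.red_led_on(1.10), node.red_led_on(1.50))  # True False
```

A hold window somewhat longer than the broadcast period keeps the LED from flickering between frames; that broadcast period is likewise an assumption here.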

Initiate: Users confirm their desire to connect to an appliance by tapping on the Glass touchpad. After they are connected, the target appliance turns on a blue LED as visual feedback (Figure 2C). The next section on disambiguation deals with cases in which multiple appliances receive valid IR signals. At this point, all further communication switches over to the 802.15.4 wireless network, so line of sight to the target is no longer needed.
Figure 4. Two screenshots of UI controls for a lamp (A) and a video player (B).


Control: Glass displays a user interface for the parameters of the chosen appliance. Upon connection, Glass retrieves the current status of the appliance and shows it in the UI. The interface is controlled with the temple-mounted touchpad through the following gesture set: tapping toggles discrete parameters (such as power for a lamp, Figure 4(A)); a single-finger swipe changes between available parameters; a two-finger swipe adjusts continuous parameters (such as volume for a video player, Figure 4(B)). This scheme was chosen because the touchpad is only comfortably operable along its front-to-back axis, not up and down. Control commands are sent over the XBee radios.
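The gesture set above maps onto a small dispatcher. The sketch below is hypothetical: the gesture names (`tap`, `swipe1`, `swipe2`), the signed swipe amount, and the normalization of continuous parameters to [0, 1] are assumptions, and it only updates local state where the real system would also send a command over the XBee radio.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ControlUI:
    """Touchpad gesture handling for one connected appliance (sketch)."""
    discrete: Dict[str, bool]        # e.g. {"power": False}
    continuous: Dict[str, float]     # e.g. {"volume": 0.5}, range [0, 1]
    order: List[str] = field(default_factory=list)
    index: int = 0                   # currently focused parameter

    def __post_init__(self) -> None:
        if not self.order:
            self.order = list(self.discrete) + list(self.continuous)

    @property
    def current(self) -> str:
        return self.order[self.index]

    def handle(self, gesture: str, amount: float = 0.0) -> None:
        name = self.current
        if gesture == "tap" and name in self.discrete:
            # Tap toggles a discrete parameter such as lamp power.
            self.discrete[name] = not self.discrete[name]
        elif gesture == "swipe1":
            # One-finger swipe moves focus forward or backward, wrapping around.
            step = 1 if amount >= 0 else -1
            self.index = (self.index + step) % len(self.order)
        elif gesture == "swipe2" and name in self.continuous:
            # Two-finger swipe nudges a continuous parameter, clamped to [0, 1].
            value = self.continuous[name] + amount
            self.continuous[name] = min(1.0, max(0.0, value))
        # Other gestures (or a tap on a continuous parameter) are ignored.


ui = ControlUI(discrete={"power": False}, continuous={"volume": 0.5})
ui.handle("tap")           # power becomes True
ui.handle("swipe1", 1.0)   # focus moves to "volume"
ui.handle("swipe2", 0.3)   # volume becomes about 0.8
```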

Disengagement: Users stay connected to the last selected appliance until a timeout period expires. During that period, users can disengage with a downward swipe.
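Taken together, the Look, Initiate, Control, and Disengagement steps behave like a small state machine. The following sketch is an illustration under stated assumptions: the state names, the 30-second timeout (the paper does not give a value), and measuring the timeout from the last touchpad input are all hypothetical choices.

```python
import enum
from typing import Optional, Set

TIMEOUT_S = 30.0  # hypothetical; the actual timeout value is not stated


class State(enum.Enum):
    LOOK = "look"                  # scanning; appliances in range show red LEDs
    DISAMBIGUATE = "disambiguate"  # several appliances answered the IR signal
    CONNECTED = "connected"        # control UI shown; commands go over radio


class Session:
    """Connection lifecycle on the Glass side (hypothetical sketch)."""

    def __init__(self) -> None:
        self.state = State.LOOK
        self.target: Optional[str] = None
        self.last_input = 0.0

    def tap(self, in_ir_range: Set[str], now: float) -> None:
        self.last_input = now
        if self.state is State.LOOK and in_ir_range:
            if len(in_ir_range) == 1:
                # A single responder: connect directly (its blue LED turns on).
                self.target = next(iter(in_ir_range))
                self.state = State.CONNECTED
            else:
                self.state = State.DISAMBIGUATE

    def choose(self, target: str, now: float) -> None:
        # Called once the user has picked one candidate in the dialog.
        self.last_input = now
        if self.state is State.DISAMBIGUATE:
            self.target = target
            self.state = State.CONNECTED

    def swipe_down(self, now: float) -> None:
        self.last_input = now
        if self.state is State.CONNECTED:
            self._disengage()

    def tick(self, now: float) -> None:
        # Periodic check: drop the connection after TIMEOUT_S without input.
        if self.state is State.CONNECTED and now - self.last_input > TIMEOUT_S:
            self._disengage()

    def _disengage(self) -> None:
        self.state = State.LOOK
        self.target = None
```

Once the session is connected, IR input is no longer consulted, which mirrors the switch to 802.15.4 described in the Initiate step.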

Disambiguation

Head orientation only indicates a general area of visual interest. It does not necessarily match gaze direction, as the extra-ocular muscles can move the eyes independently of the head. The IR beam of our device also has a certain spread (see next section). In an environment dense with potential targets, multiple targets could be within range. Users can tell when this happens because multiple feedback LEDs in the environment illuminate (Figure 3A). To disambiguate, users can either move to adjust their head position or call up a disambiguation dialog on the Glass display. The dialog presents a list filtered to only those appliances that are within IR range, while the appliances use their blue LEDs as visual cues: all responding appliances light up their LEDs while the currently selected one blinks (Figure 3B). Users navigate the list using the touchpad (Figure 3C) and then continue their interaction as described above.
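The disambiguation dialog can be sketched as a cyclic cursor over the filtered candidate list, with each candidate's blue-LED state derived from the cursor. The class, its method names, and the LED state strings are hypothetical; only the behavior (a list filtered to IR responders, all responders lit, the selected one blinking) comes from the text.

```python
from typing import Dict, List


class DisambiguationDialog:
    """Cycle through appliances that received the IR signal (sketch)."""

    def __init__(self, in_range: List[str]) -> None:
        if not in_range:
            raise ValueError("dialog needs at least one candidate")
        self.candidates = list(in_range)  # filtered list: only IR responders
        self.index = 0

    def swipe(self, direction: int) -> None:
        # Forward (+1) or backward (-1) swipe, wrapping around the list.
        self.index = (self.index + direction) % len(self.candidates)

    @property
    def selected(self) -> str:
        return self.candidates[self.index]

    def led_states(self) -> Dict[str, str]:
        # Every responder shows its blue LED; the highlighted one blinks.
        return {name: ("blink" if i == self.index else "on")
                for i, name in enumerate(self.candidates)}


dialog = DisambiguationDialog(["lamp", "tv", "speaker"])
dialog.swipe(-1)                # wraps from the first entry to the last
print(dialog.selected)          # speaker
```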

HARDWARE DEVICE
Prototype Implementation

Our prototype consists of a Google Glass Explorer Edition head-worn computing device, augmented with an infrared emitter that is mounted on the frame, pointing out in the direction of the wearer's view (Figure 5). The IR emitter LED is mounted in an opaque hollow tube that restricts the outgoing angle of illumination.

In our prototype, Glass communicates over Bluetooth with an additional microcontroller board (Atmel ATMega256) that the user has to wear. This board marshals messages between XBee and Bluetooth in both directions and also controls the IR LED mounted on the Glass frame (Figure 6). This architecture was chosen mostly for reasons of expediency. We selected XBee 802.15.4 radios to avoid Bluetooth wake-up latencies, but we do not claim optimality for our design decisions. Future head-mounted devices could clearly integrate IR emitters; the choice of local wireless technology could also change. In particular, one could substitute WiFi modules or design an all-Bluetooth network.

Figure 5. Our augmented Glass prototype has a frame-mounted infrared emitter.

Figure 6. In our system architecture, selection is initiated through infrared but confirmed over 802.15.4. This permits wearers to move their head freely after connecting to an appliance. In the research prototype, users have to carry an additional microcontroller board that marshals messages between Glass' Bluetooth radio and IR/802.15.4, but our custom hardware could also be integrated into the wearable device.


Device Characterization

We determined the usable range and accuracy empirically with one IR emitter and two IR receivers. The IR emitter constantly sent out an id signal. Receivers that correctly received the signal turned their LED on for 300 ms.

We placed all three devices at the same height with a clear line of sight. The IR emitter was first placed 2 feet away from the receivers. The receivers were then moved sideways, apart from each other, until they could no longer receive stable signals. We then recorded the distance between the two receivers to calculate the coverage angle. These steps were repeated with the IR emitter at different distances (as shown in Table 1). We then repeated the measurements with the emitter placed at various depths in the tube (see Figure 7).

Figure 7. Different IR configurations (panels for 0.5", 1.0", and 1.5" emitter inset, plotted against target distance) suggest usable beam widths of 2 to 4 feet and distances up to 16 feet.

Depth \ Distance   2'     4'     8'     12'    16'
0"                 74°    78°    N/A    N/A    N/A
0.5"               60°    48°    28°    22°    16°
1.0"               46°    36°    26°    18°    10°
1.5"               36°    32°    18°    14°    6°

Table 1. Measured IR coverage angles θ at different target distances and different depths of the IR emitter inside the shielding tube.
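The coverage angle can be recovered from the receiver separation with elementary trigonometry, and the same relation converts the angles in Table 1 back into beam diameters. The sketch assumes the emitter is aimed at the midpoint between the two receivers, so each receiver sits half the separation off-axis; the paper does not spell out its exact formula. Under this assumption, the 16° measured at 16' with a 0.5" inset corresponds to a beam about 4.5' wide, and 10° at 16' with a 1.0" inset to about 2.8'.

```python
import math


def coverage_angle_deg(separation_ft: float, distance_ft: float) -> float:
    """Full beam angle from receiver separation at a given emitter distance.

    Assumes the emitter points at the midpoint between the two receivers,
    so each receiver sits separation/2 off-axis.
    """
    return math.degrees(2 * math.atan((separation_ft / 2) / distance_ft))


def beam_width_ft(angle_deg: float, distance_ft: float) -> float:
    """Diameter of the covered area at distance_ft for a full beam angle."""
    return 2 * distance_ft * math.tan(math.radians(angle_deg) / 2)


# Coverage angles from Table 1 (degrees), keyed by emitter inset (inches)
# and then by target distance (feet).
TABLE1 = {
    0.5: {2: 60, 4: 48, 8: 28, 12: 22, 16: 16},
    1.0: {2: 46, 4: 36, 8: 26, 12: 18, 16: 10},
    1.5: {2: 36, 4: 32, 8: 18, 12: 14, 16: 6},
}

for inset, row in TABLE1.items():
    widths = {d: round(beam_width_ft(a, d), 1) for d, a in row.items()}
    print(f'{inset}" inset, beam width (ft) by distance:', widths)
```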

In summary, our measurements suggest that IR communication can be targeted to an area about 2 to 4 feet in diameter, up to 16 feet in front of the user. These values are a reasonable match for selecting appliances in a room-size environment. A wider beam would increase the chance of multiple appliances receiving IR signals simultaneously. A narrower beam would make targeting more challenging, given the precision constraints of human head movement.
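The trade-off between beam width and multiple hits can be illustrated with an idealized geometric model in which an appliance receives the signal if it lies inside a cone around the head direction and within range. This is a simplification introduced for illustration: it ignores occlusion, receiver orientation, and IR reflections, and the 16-foot default range is taken from the measurements above. The room layout in the example is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a: Vec) -> float:
    return math.sqrt(_dot(a, a))


def targets_in_beam(head: Vec, direction: Vec, appliances: Dict[str, Vec],
                    beam_angle_deg: float, max_range_ft: float = 16.0) -> List[str]:
    """Names of appliances inside an ideal IR cone (positions in feet)."""
    cos_half = math.cos(math.radians(beam_angle_deg) / 2)
    dir_len = _norm(direction)
    hits = []
    for name, pos in appliances.items():
        offset = _sub(pos, head)
        dist = _norm(offset)
        if dist == 0 or dist > max_range_ft:
            continue  # co-located or out of range
        # Inside the cone if the angle to the target is at most half the beam angle.
        if _dot(offset, direction) / (dist * dir_len) >= cos_half:
            hits.append(name)
    return sorted(hits)


head, facing = (0.0, 0.0, 5.0), (1.0, 0.0, 0.0)
room = {"lamp": (10.0, 0.0, 5.0), "tv": (10.0, 1.0, 5.0),
        "speaker": (0.0, 10.0, 5.0), "heater": (20.0, 0.0, 5.0)}
print(targets_in_beam(head, facing, room, beam_angle_deg=16))  # ['lamp', 'tv']
print(targets_in_beam(head, facing, room, beam_angle_deg=10))  # ['lamp']
```

With the wider 16° beam, two closely spaced appliances both respond and the user would need to disambiguate; the narrower 10° beam isolates one, at the cost of requiring more precise head aiming.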

PHYSICAL TARGET ACQUISITION STUDY

To understand the accuracy and performance of head-orientation-based selection through our device, we carried out a comparative target acquisition study in which participants had to connect to wireless nodes distributed in a room, both with our technique and with an alternative list selection approach.

Apparatus

In an indoor environment, 10 wireless nodes were spread across a room at various heights and distances (Figure 8). The nodes are stand-ins for potential smart appliances and have all the functionality relevant for targeting and wireless communication, but do not control any actual appliances. Each node is an embedded wireless system with a microcontroller, an IR receiver, a wireless XBee radio, and three status LEDs (Figure 9). A yellow LED indicates that the device is the target that should be selected in the current trial; a red LED lights up whenever the device receives an IR signal from Glass; a blue LED shows when participants have successfully connected to a device, and is also used for disambiguation when multiple targets are within IR range. Next to each target, a paper sheet shows a number and letter combination, which is used to uniquely identify the device. The numbers are the primary identifiers, ordered from left to right in the room. This ordering makes the targets easy to locate, which simulates looking towards an appliance with a well-known location in a room and minimizes visual search time.

Figure 8. In the targeting study, participants had to find and select one of 10 targets in a lab environment. Targets were called out by number; in the list mode condition, participants needed to match numbers to letters.

Methodology

In our within-subjects design, participants performed 15 target acquisition tasks with each of two interaction styles. In the infrared mode condition, participants used our IR targeting approach; in the list mode condition, participants had to look up a device's letter code on the printed paper next to the device and then select that letter code from a list displayed on their Glass device. The list was navigated with swipe motions on the Glass touchpad. For each task, participants started at a fixed position in the room. The experimenter called out a number and simultaneously started a timer. Participants then had to find the corresponding device (by looking for its printed code). In infrared mode, participants then selected and acquired the target by aiming the IR beam at it, and confirmed their selection with a touchpad tap. If more than one target was within range, participants had to either use the disambiguation dialog or reposition themselves. In list mode, participants had to read the letter next to the number and then select that letter by browsing a linear list shown on their Glass display. While the list was alphabetized, the arrangement of letters in the room was not. This design required participants to find the target in the room before starting list navigation, to keep visual search times similar in both conditions.

Figure 9. An example node from the targeting study, with its wireless radio, IR receiver, microcontroller, and status LEDs. We constructed 10 such nodes, each mounted in a box.

Afterwards, participants completed a survey with Likert-scale questions as well as open-ended questions about their experience.

Participants

We recruited 14 participants from our institution. Thirteen had never used Glass before. Four wore prescription glasses, which may have affected their task performance, as wearing glasses beneath Glass makes it more cumbersome to secure the position of Glass and to adjust the screen to the optimal angle. Half of the participants performed the infrared mode condition first, and the other half performed the list mode condition first.


Measures

The main measures were target acquisition time, the time required to identify, select, and connect to a wireless target device; and user preference, which interface users preferred for the task after completing the study.


Results

Performance data

The time to complete each task can be broken down into the following components (bracketed terms occur only in some trials):

$$t_{\mathrm{infrared}} = t_{\mathrm{locate}} + [t_{\mathrm{reorient}}] + [t_{\mathrm{disambiguate}}] + t_{\mathrm{tap}}$$

$$t_{\mathrm{list}} = t_{\mathrm{locate}} + t_{\mathrm{listnav}} + t_{\mathrm{tap}}$$

In both conditions, participants first have to locate the target announced by the experimenter through visual search ($t_{\mathrm{locate}}$). In infrared mode, participants may then directly confirm their selection if only a single target was selected ($t_{\mathrm{tap}}$). However, if they do not immediately receive feedback that their target was selected, or if multiple targets were selected, users either have to change their position or head orientation ($t_{\mathrm{reorient}}$) or step through the on-screen disambiguation dialog ($t_{\mathrm{disambiguate}}$). In list mode, participants must scroll through the list to find the desired target identifier ($t_{\mathrm{listnav}}$). Thus, infrared mode will show a performance benefit if $t_{\mathrm{reorient}} + t_{\mathrm{disambiguate}} < t_{\mathrm{listnav}}$. This depends on the total number of devices in the environment (which increases $t_{\mathrm{listnav}}$) and on their density (which increases $t_{\mathrm{disambiguate}}$).
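The decomposition can be written directly as code, which makes explicit that $t_{\mathrm{locate}}$ and $t_{\mathrm{tap}}$ cancel when the two modes are compared. The function names and the sample component times below are hypothetical, chosen only to exercise the functions; they are not measured values.

```python
def t_infrared(t_locate: float, t_tap: float,
               t_reorient: float = 0.0, t_disambiguate: float = 0.0) -> float:
    # Bracketed terms default to 0 for trials without reorientation or dialog.
    return t_locate + t_reorient + t_disambiguate + t_tap


def t_list(t_locate: float, t_listnav: float, t_tap: float) -> float:
    return t_locate + t_listnav + t_tap


def infrared_faster(t_reorient: float, t_disambiguate: float,
                    t_listnav: float) -> bool:
    # Shared terms (t_locate, t_tap) cancel, leaving only mode-specific costs.
    return t_reorient + t_disambiguate < t_listnav


# Hypothetical component times in seconds.
ir = t_infrared(t_locate=3.0, t_tap=0.5, t_disambiguate=2.0)
lst = t_list(t_locate=3.0, t_listnav=4.0, t_tap=0.5)
print(ir < lst, infrared_faster(0.0, 2.0, 4.0))  # True True
```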


We first show results for 10 targets and then discuss extrapolations of these results. The average target acquisition time $t_{\mathrm{infrared}}$ was 6.67 seconds, while $t_{\mathrm{list}}$ was 8.86 seconds (Figure 10A). This difference is significant (Student's t-test, t(279) = -3.81, p = 0.00017).


To further understand the performance gain in infrared mode, especially the contribution of $t_{\mathrm{disambiguate}}$, we compare selection times when multiple devices are targeted (and disambiguation is required) to single-device selection times (Figure 10B). When there is a single device, $t_{\mathrm{disambiguate}}$ is 0 and it takes 6.40 seconds on average to complete the connection. When multiple devices are in range, the time increases to 9.16 seconds, indicating that about 2.76 seconds are required to disambiguate. Although the multiple case takes significantly longer (t-test, t(19) = -2.7827, p = 0.012), such cases made up only 10% of all infrared trials.

Figure 10. Boxplot of task completion times (seconds) for the comparison between infrared mode and list mode (A; IR: mean 6.67, median 5.77; List: mean 8.86, median 7.96), and between IR multiple-response cases and IR single-response cases (B; IR multiple: mean 9.16, median 7.67; IR single: mean 6.40, median 5.63). The centers of boxes are median values, while white dashed lines are mean values.

Figure 11. Times taken to select a device vs. its order in the list (x-axis: order in list, 0 to 9). The dotted line is a linear fit between the mean times and device orders. Two horizontal lines of mean target acquisition times in infrared mode are also annotated for comparison.
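The reported averages are mutually consistent, which a short calculation can check: weighting the single- and multiple-response means by their shares of infrared trials should approximately reproduce the overall infrared mean. The sketch treats the reported 10% as an exact share, which is an assumption since the text gives a rounded figure; under it, the weighted mean is about 6.68 s, close to the reported 6.67 s.

```python
# Means (seconds) reported for infrared trials.
T_SINGLE = 6.40         # one appliance in range, no disambiguation
T_MULTIPLE = 9.16       # several appliances in range
SHARE_MULTIPLE = 0.10   # "only 10%" of infrared trials; assumed exact here

t_disambiguate = T_MULTIPLE - T_SINGLE
weighted_ir_mean = (1 - SHARE_MULTIPLE) * T_SINGLE + SHARE_MULTIPLE * T_MULTIPLE

print(f"t_disambiguate = {t_disambiguate:.2f} s")      # 2.76 s
print(f"weighted IR mean = {weighted_ir_mean:.2f} s")  # 6.68 s
```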

For each device, $t_{\mathrm{listnav}}$ depends on its position in the list. Figure 11 shows the time it takes to select a device (means and standard deviations) as a function of its list position; the trend line (dotted) enables extrapolation to estimate the number of devices at which the infrared mode interaction technique will outperform list mode⁴. The figure shows that once the target's order exceeds 6, the average $t_{\mathrm{listnav}}$ for that target would be larger than $t_{\mathrm{reorient}} + t_{\mathrm{disambiguate}}$. We therefore expect the time savings of infrared mode to grow as the number of targets increases.
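The extrapolation from Figure 11 amounts to fitting a line to mean selection time versus list position and solving for where it crosses the infrared-mode mean. The sketch below implements an ordinary least-squares fit; the function names are hypothetical, and the per-position means in the usage example are synthetic placeholders, not the study's data, so the resulting crossover is illustrative only.

```python
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def crossover_position(positions: Sequence[float], mean_times: Sequence[float],
                       infrared_mean: float) -> float:
    """List position beyond which list mode is expected to be slower."""
    intercept, slope = linear_fit(positions, mean_times)
    if slope <= 0:
        raise ValueError("list-mode times do not grow with position")
    return (infrared_mean - intercept) / slope


# Synthetic example: times rise by 0.5 s per list position from 4.0 s.
positions = list(range(10))
times = [4.0 + 0.5 * p for p in positions]
print(crossover_position(positions, times, infrared_mean=7.0))  # 6.0
```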

⁴The higher mean value at order = 1 is caused by one outlier, in which the participant tried multiple times in list mode to connect to the right target.
Figure 12. Likert scale ratings for ease of use of aspects of the targeting task (Likert Ratings: Targeting Methods). Rated items: ease of connecting in IR mode; ease of aiming (IR mode); ease of detecting multiple selection; ease of narrowing down selection; ease of connecting in list mode; cumbersomeness of navigating the list. Error bars show standard error.


Figure 13. In the smart home scenario, participants completed a series of appliance control tasks in a simulated living room.


Selection errors occurred in both conditions. However, error rates were low (1.1% in infrared mode and 2.9% in list mode), which precluded a more detailed analysis.

Preference

Eleven of 14 users preferred infrared mode over list mode (three preferred list mode, one was undecided). While both interfaces were judged similarly on overall ease of connecting, list navigation was also perceived to be cumbersome (see Figure 12). As self-report data can easily skew positive when participants try to please experimenters, we also asked participants to explain why they preferred one interface over the other.

List  mode  had  certain  advantages.  It  was  judged  to  be  more  accurate  and  predictable,  as  exactly  one  device  was  always  selected  in  the  list  ("With  the  list  you  never  have  to  worry  about  accidentally  picking  up  two  targets").  It  also  did  not  require  a  clear  line  of  sight  to  the  target  device,  so  participants  did  not  have  to  move  from  their  starting  position  ("The  shortcoming  of  the  IR  mode  was  that  you  had  to  be  a  certain  distance  away  in  order  for  it  to  detect  the  appliance").

On  the  other  hand,  list  mode  was  judged  to  be  more  "annoying"  and  tedious.  The  temple-mounted  touchpad  used  for  selection  was  difficult  to  use  for  one  participant  with  long  hair:  "List  mode  was  physically  difficult  for  me  to  navigate,  since  my  long  hair  wasn't  tied  back  and  it  kept  interfering  with  my  swiping."  Another  participant  also  commented  on  the  ergonomic  challenge  of  touchpad  use  on  Glass:  "The  strength  of  the  IR  mode  was  that  I  didn't  have  to  use  my  fingers  as  much  to  control.  If  the  items  were  spaced  relatively  far  apart,  it  was  easy  to  select  a  specific  appliance."

One  noted  benefit  of  infrared  mode  was  a  feeling  that  it  was  "more  direct  [than  list  mode]",  allowing  users  to  focus  on  the  targeted  objects  instead  of  the  screen.  One  participant  called  it  "natural  to  interact  with  things  just  by  looking  at  them".  Another  mentioned  that  "it's  really  convenient  that  what  I'm  looking  at  is  what  I'm  targeting".


Perceived  weaknesses  of  infrared  mode  included  the  need  to  move  the  head  in  order  to  control  a  device  and  the  imperfect  mapping  of  gaze  to  target.  One  participant  said  that  it  was  "awkward  to  be  aiming  your  head  at  things,  tweaking  back  and  forth  to  get  it  right".  Another  noted  that  head  orientation  alone  did  not  capture  the  locus  of  her  attention,  because  "eye  movement  is  an  important  part  of  how  people  look  around".  Users  also  had  to  learn  the  usable  angle  of  the  IR  emitter  before  they  could  reliably  control  the  devices:  "I  had  to  compensate  by  tilting  my  head  up  a  little  bit."

SMART  HOME  CONTROL  SCENARIO 

To  understand  how  our  device  could  be  used  to  interact  with  smart  appliances,  we  also  asked  all  study  participants  to  work  through  a  concrete  scenario.  The  main  goal  was  to  obtain  qualitative  feedback  on  the  usability  and  utility  of  our  device  with  more  realistic  tasks.

Methodology 

We  recreated  a  living  room  environment  with  three  controllable  appliances:  a  fan,  a  lamp,  and  a  laptop  functioning  as  a  video  player  (see  Figure  13).  The  fan  and  lamp  had  binary  controls:  they  could  be  switched  on  or  off.  The  laptop  had  multiple  parameterized  functions:  participants  could  start,  pause,  fast  forward,  and  rewind  playback,  and  adjust  the  volume.

We  then  asked  participants  to  work  through  the  following  script,  which  sets  up  the  room  for  watching  a  movie  in  the  evening:

1.  Turn  off  the  lights  as  you  want  to  watch  the  movie  in  a  darkened 
room. 

2.  You  feel  a  little  hot  in  the  room,  so  you  turn  on  the  fan. 

3.  You  connect  to  the  Smart  TV  and  start  playing  the  movie. 

4.  The  volume  seems  too  soft  to  hear  over  the  fan;  turn  it  up  a  bit.

5.  After  a  while,  you  want  to  take  a  break  to  get  a  snack.  Pause  the 
movie. 

6.  When  you  come  back,  you've  forgotten  what  was  said  last;  rewind  by  30  seconds  and  restart  the  movie.
7.  When  the  credits  roll,  you  stop  the  movie  and  turn  the  lights  back  on.

8.  After  a  while,  you  turn  off  the  fan  and  leave  the  room.

[Figure  14  plot:  "Likert  Ratings:  Home  Control  Scenario".  Items:  ease  of  connecting;  ease  of  control:  lamp;  ease  of  control:  fan;  ease  of  control:  TV.  Scale  from  1  (strongly  disagree)  through  4  (neutral)  to  7  (strongly  agree).]

Figure  14.  Likert  scale  ratings  for  ease  of  use  of  aspects  of  the  smart  home  control  scenario.  Error  bars  show  standard  error.


For  this  scenario,  we  elicited  only  subjective  data:  Likert  ratings  and  open-ended  responses.

Results 

All  participants  successfully  completed  the  list  of  tasks.  They  commented  positively  on  the  universal  remote  control  functionality  (e.g.,  "I  didn't  have  to  search  for  different  remote  controllers  for  different  appliances")  and  stated  that  it  was  easy  to  target  and  connect  to  appliances,  in  line  with  the  findings  of  the  targeting  study.  Participants  saw  benefits  of  the  device  for  families  ("It  might  also  be  useful  for  people  who  need  to  take  care  of  small  children  that  they  can  complete  all  the  tasks  while  keeping  an  eye  on  their  children  at  the  same  time"),  though  settings  that  require  more  movement  than  watching  a  movie  at  home  (e.g.,  cooking)  may  be  more  appropriate  scenarios  for  wearing  the  device.

Simple  and  Complex  Controls 

Participants  rated  the  ease  of  control  of  particular  appliances  differently.  Ease  of  use  ratings  were  higher  for  the  lamp  and  fan,  which  had  simple,  discrete  on/off  actions,  and  lower  for  the  more  complex  movie  player  (see  Figure  14).  Multiple  participants  attributed  the  difficulty  to  the  affordances  of  Glass:  "Most  of  the  difficulty  I  had  with  Glass  came  from  having  to  navigate  the  interface  on  the  tiny  screen  with  the  touch  pad".  The  small  screen  and  (largely)  1D  input  limit  the  complexity  of  the  interfaces  that  can  be  presented.  As  one  participant  remarked:  "[The  media  player]  does  not  seem  to  be  more  efficient  than  a  tablet  device."  The  difficulty  can  also  partly  be  ascribed  to  our  interaction  design,  which  required  one-finger  swipes  to  switch  between  parameters  and  two-finger  gestures  to  adjust  them;  users  found  it  hard  to  exert  fine  control  with  two-finger  swipes.  In  addition,  users  did  not  always  remember  these  mappings,  as  they  are  not  yet  part  of  a  standard  gesture  vocabulary.
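The gesture mapping described above (one-finger swipes switch parameters, two-finger swipes adjust the selected one) can be sketched as a small state machine. This is a hypothetical illustration and does not reproduce the prototype's code: the class names, the `on_swipe` event interface, the parameter ranges, and the step sizes are all assumptions, and touchpad event handling on Glass is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Parameter:
    """One adjustable appliance parameter (hypothetical model)."""
    name: str
    value: float
    lo: float
    hi: float
    step: float

    def adjust(self, direction: int) -> None:
        # Move one step in the swipe direction, clamped to [lo, hi].
        self.value = min(self.hi, max(self.lo, self.value + direction * self.step))


@dataclass
class ControlScreen:
    """Control UI for one appliance: a list of parameters and a cursor."""
    parameters: List[Parameter]
    selected: int = 0

    def on_swipe(self, fingers: int, direction: int) -> str:
        """Handle a swipe; direction is +1 (forward) or -1 (backward).
        Returns the text the near-eye display would show."""
        if fingers == 1:
            # One-finger swipe: switch to the next/previous parameter.
            self.selected = (self.selected + direction) % len(self.parameters)
        elif fingers == 2:
            # Two-finger swipe: adjust the currently selected parameter.
            self.parameters[self.selected].adjust(direction)
        p = self.parameters[self.selected]
        return f"{p.name}: {p.value:g}"
```

Separating "which parameter" from "how much" keeps each gesture 1D, which suits the temple touchpad, but it also means every adjustment needs the two-finger swipe that participants found hard to control finely.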

Eliminating  Steps 


Participants  liked  the  efficiency  of  our  design  but  also  suggested  further  simplification  by  eliminating  the  explicit  connection  step.  They  wanted  to  connect  immediately  to  any  appliance  that  receives  the  infrared  signal:  "I  intuitively  want  the  screen  to  automatically  appear  when  the  IR  detects  the  appliance  rather  than  having  to  tap  to  connect."  Such  a  design  would  increase  the  efficiency  of  interaction,  but  at  a  cost  in  power,  as  the  wireless  radio  would  have  to  send  and  receive  data  each  time  the  user  looks  at  a  device,  whether  on  purpose  or  inadvertently.  We  leave  the  study  of  the  battery  life  implications  of  interaction  design  choices  to  future  work.
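The power tradeoff can be made concrete by counting how often the radio would open a session under each policy. The sketch below assumes a hypothetical trace format (one detected appliance ID, or `None`, per time step) and a hypothetical set of tap times; the function name is illustrative, and the sketch counts sessions only, without modeling actual energy use.

```python
def radio_sessions(detections, taps, auto_connect):
    """Count wireless connection sessions over a trace of IR detections.

    detections:   per-time-step ID of the appliance whose receiver reports the
                  head-worn IR beam, or None if no appliance is hit.
    taps:         set of time steps at which the user tapped to connect.
    auto_connect: if True, open a session whenever the beam lands on a new
                  appliance (including returning to one after a gap); if False,
                  open one only when the user taps while an appliance is hit.
    """
    sessions = 0
    previous = None
    for t, target in enumerate(detections):
        if auto_connect:
            if target is not None and target != previous:
                sessions += 1
        elif t in taps and target is not None:
            sessions += 1
        previous = target
    return sessions
```

In a trace where the user glances across several appliances but taps only once, the automatic policy opens one session per glance, while the explicit policy opens one in total.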

Feedback  On  Screen  or  In  The  World 
We  found  that  the  near-eye  display  may  occlude  or  overlap  the  target  appliance  when  a  participant  looks  at  it.  This  can  make  it  difficult  either  to  read  the  on-screen  display  or  to  see  information  displayed  on  the  target  device.  As  one  participant  remarked,  "This  was  especially  annoying  with  the  TV  because  there  were  two  screens  overlapping  each  other."  While  users  can  look  away  once  a  device  has  been  acquired  to  see  the  Glass  display  better,  a  tension  remains  over  whether  users  should  rely  on  feedback  from  the  appliances  themselves  or  on  the  near-eye  display.

DISCUSSION 

Our  study  procedures  demonstrated  that  users  can  successfully  select  and  control  smart  appliances  with  head-worn  infrared  targeting,  and  that  this  technique  outperforms  list  selection  on  the  Google  Glass  wearable  device.  In  this  section,  we  revisit  some  of  the  results  and  observations  and  discuss  their  larger  significance  and  potential  paths  for  future  work.

Meaningful  Results? 

While  the  performance  increase  in  our  targeting  study  is  statistically  significant,  readers  may  wonder  whether  it  is  truly  meaningful.  We  believe  it  is,  for  two  reasons.  First,  our  technique  avoids  the  problems  of  naming  and  scoping  inherent  in  any  interface  that  uses  representations  of  objects  (e.g.,  a  list  of  identifiers)  rather  than  the  objects  themselves.  We  argue  that  this  disintermediation  of  interaction  leads  to  a  cognitively  simpler  design.  Second,  our  study  only  tested  a  modest  number  of  targets.  Because  list  navigation  time  grows  with  a  target's  position  in  the  list,  while  infrared  targeting  does  not  require  traversing  a  list,  our  technique  should  have  a  wider  margin  as  the  number  of  targets  increases.  Of  course,  the  number  of  targets  one  can  realistically  expect  may  vary  across  application  domains.
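The argument for a widening margin can be made explicit with a simple model. Assume, for illustration only, that the trend line is linear, $t_{navigate}(k) \approx \alpha + \beta k$ for a target at list position $k$ (the symbols $\alpha$ and $\beta$ are our notation), that the desired target is equally likely to be at any of the $N$ positions, and that the infrared cost $t_{reorient} + t_{disambiguate}$ does not depend on $N$:

```latex
\begin{aligned}
\mathbb{E}[t_{list}] &= \frac{1}{N}\sum_{k=1}^{N} (\alpha + \beta k)
                      = \alpha + \beta\,\frac{N+1}{2},\\
\Delta(N) &= \mathbb{E}[t_{list}] - \left(t_{reorient} + t_{disambiguate}\right)
           = \alpha + \beta\,\frac{N+1}{2} - t_{reorient} - t_{disambiguate}.
\end{aligned}
```

With $\beta > 0$, $\Delta(N)$ grows linearly with slope $\beta/2$ per additional target. The constant-cost assumption is optimistic: disambiguation may take longer when many targets are placed close together.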

Hands-Free  Operation 

While  head  movement  is  not  as  precise  as  hand  positioning  (e.g.,  in  Patel's  mobile  laser  pointing  [15]),  one  key  benefit  of  our  target  acquisition  step  is  that  it  does  not  require  the  user's  hands.  This  raises  the  question  of  whether  the  rest  of  the  interaction  (disambiguation  and  device  control)  could  also  be  achieved  in  a  hands-free  fashion.  Voice-command  control  is  an  obvious  candidate,  though  such  approaches  have  not  found  widespread  adoption  because  of  social  acceptability  and  other  factors.
Hardware  Limitations

In  our  prototype,  the  IR  emitter  is  fixed  in  a  single  position.  Because  users'  heads  differ  in  size  and  shape,  the  emitter  may  not  line  up  exactly  with  their  head  orientation.  An  adjustable  emitter  (paired  with  a  suitable  calibration  routine)  would  improve  the  performance  of  our  design.  Some  of  our  users  also  mentioned  that  they  had  to  move  closer  to  some  targets  to  select  them  successfully.  A  stronger  emitter  could  overcome  these  problems,  but  care  has  to  be  taken  to  avoid  possible  reflection  problems,  where  IR  light  bounces  off  a  wall  and  hits  an  unintended  target  behind  the  user.
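A calibration routine of the kind mentioned above could, for example, estimate a per-user angular offset between where users look and where the fixed emitter points. The sketch below is hypothetical: it assumes head yaw and pitch (in degrees, with pitch increasing as the head tilts up) can be read from the device's orientation sensors, and that for each calibration target we record both the orientation at which the user reports looking straight at it and the orientation at which its receiver reports the beam. The function names are illustrative.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def estimate_offset(trials):
    """Estimate a per-user (yaw, pitch) offset between gaze and emitter axis.

    trials: list of ((look_yaw, look_pitch), (beam_yaw, beam_pitch)) pairs in
      degrees, where look_* is the head orientation when the user reports
      looking straight at a calibration target, and beam_* is the head
      orientation at which that target's receiver reports the beam.
    Returns the mean (yaw, pitch) difference beam - look.
    """
    n = len(trials)
    # Yaw differences are wrapped so that e.g. 179 vs -179 counts as 2 degrees.
    yaw = sum(wrap_deg(beam[0] - look[0]) for look, beam in trials) / n
    pitch = sum(beam[1] - look[1] for look, beam in trials) / n
    return yaw, pitch
```

Under these assumptions, a positive pitch offset corresponds to users having to tilt their head up to hit a target, as one participant reported; an adjustable emitter could be rotated by the estimated offset.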

More  importantly,  the  main  limitation  of  our  design  is  that  extra  hardware  for  infrared  communication  is  needed  on  both  the  head-mounted  device  and  each  controllable  appliance.  One  potential  approach  to  sidestep  this  requirement  would  be  to  combine  the  growing  availability  of  high-resolution  indoor  maps  with  live  data  from  the  device's  point-of-view  camera  to  determine  what  a  user  is  looking  at  without  any  infrared  data  exchange.

The  Midas  Look 

Our  participants  suggested  eliminating  explicit  initiation  of  a  connection  by  the  user.  However,  one  of  the  design  guidelines  for  near-eye  displays  is  to  avoid  pushing  information  to  the  display  without  an  initial  request  from  the  user;  flashing  device  information  on  screen  each  time  a  user  moves  their  head  would  surely  be  counterproductive.  Future  work  should  investigate  how  to  intelligently  decide  when  and  how  to  initiate  interaction  on  the  user's  behalf.
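One candidate heuristic, borrowed from gaze interaction where it is a standard response to the Midas touch problem, is a dwell threshold: connect automatically only after the beam has rested on the same appliance for some minimum time. This was not evaluated in our studies; the sketch below is a hypothetical illustration, and the threshold value would need to be tuned empirically.

```python
class DwellGate:
    """Request a connection only after the IR beam has stayed on the same
    appliance for at least dwell_s seconds (hypothetical heuristic)."""

    def __init__(self, dwell_s: float):
        self.dwell_s = dwell_s
        self.current = None   # appliance currently under the beam, if any
        self.since = 0.0      # time at which `current` was first detected
        self.fired = False    # whether a connection was already requested

    def update(self, t: float, target):
        """Feed one detection sample (timestamp in seconds, appliance ID or None).
        Returns the appliance ID to connect to, or None."""
        if target != self.current:
            # Beam moved to a different appliance (or off all of them): restart.
            self.current, self.since, self.fired = target, t, False
        if target is not None and not self.fired and t - self.since >= self.dwell_s:
            self.fired = True
            return target
        return None
```

A longer dwell suppresses more inadvertent connections (and radio traffic) at the cost of slower intentional selection.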

Where  is  the  Target? 

One  open  design  question  of  our  approach  is  where  infrared  receivers  should  be  placed.  For  a  light,  one  might  put  a  receiver  on  the  light  itself,  or  on  the  light  switch,  to  cater  to  existing  expectations.  For  volume  control,  the  infrared  receiver  might  be  located  on  a  speaker  or  on  the  amplifier.  A  thorough  study  of  user  preferences  would  be  interesting,  though  we  also  point  out  that  our  architecture  could  easily  support  multiple  receivers  that  control  the  same  appliance.
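Supporting multiple receivers per appliance amounts to a many-to-one mapping from receiver IDs to appliance IDs, applied before disambiguation so that hitting two receivers of the same appliance is not treated as a multiple selection. A minimal sketch, with hypothetical IDs and function names:

```python
def appliances_hit(receiver_hits, receiver_to_appliance):
    """Map the receivers that detected the beam to distinct appliances,
    preserving first-seen order. Unknown receivers are ignored."""
    seen = []
    for rid in receiver_hits:
        appliance = receiver_to_appliance.get(rid)
        if appliance is not None and appliance not in seen:
            seen.append(appliance)
    return seen


# Hypothetical deployment: the light has receivers on the lamp and the switch,
# the TV audio on the speaker and the amplifier.
RECEIVERS = {
    "rx-lamp": "living-room-light",
    "rx-switch": "living-room-light",
    "rx-speaker": "tv-audio",
    "rx-amp": "tv-audio",
}
```

Only when the resulting list has more than one entry would the disambiguation step be needed.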

CONCLUSION 

We  introduced  a  novel  method  for  selecting  and  controlling  smart  appliances  in  physical  spaces  through  infrared  targeting  based  on  head  orientation.  With  our  solution,  we  attempt  to  address  the  naming  and  scoping  challenges  faced  by  handheld  mobile  devices.  The  design  takes  advantage  of  the  fact  that  visual  attention  can  express  intention.  The  visual  feedback  provided  by  the  target  appliances  helps  users  keep  their  focus  in  the  physical  world.  While  our  prototype  requires  the  user  to  carry  additional  hardware,  all  parts  can  readily  be  miniaturized  and  integrated  into  future  head-worn  hardware.  We  also  introduced  a  disambiguation  technique  for  cases  in  which  head  orientation  is  not  sufficient  to  determine  a  unique  target.  We  characterized  our  device's  performance,  arguing  that  it  is  well  matched  to  the  amount  of  head  movement  people  can  control  without  strain.  A  target  acquisition  study  showed  that  the  technique  is  efficient;  a  home  control  scenario  showed  promise  but  also  limitations  when  trying  to  control  complex  appliances.  As  our  environment  continues  to  be  populated  by  a  swarm  of  sensing  and  actuation  devices,  methods  to  interrogate  and  control  our  smart  environments  will  become  increasingly  important.